Generalizing Automatically Generated Selectional Patterns
Authors
Abstract
Frequency information on co-occurrence patterns can be automatically collected from a syntactically analyzed corpus; this information can then serve as the basis for selectional constraints when analyzing new text from the same domain. This information, however, is necessarily incomplete. We report on measurements of the degree of selectional coverage obtained with different sizes of corpora. We then describe a technique for using the corpus to identify selectionally similar terms, and for using this similarity to broaden the selectional coverage for a fixed corpus size.
1 Introduction
Selectional constraints specify what combinations of words are acceptable or meaningful in particular syntactic relations, such as subject-verb-object or head-modifier relations. Such constraints are necessary for the accurate analysis of natural language text. Accordingly, the acquisition of these constraints is an essential yet time-consuming part of porting a natural language system to a new domain. Several research groups have attempted to automate this process by collecting co-occurrence patterns (e.g., subject-verb-object patterns) from a large training corpus. These patterns are then used as the source of selectional constraints in analyzing new text.
The initial successes of this approach raise the question of how large a training corpus is required. Any answer to this question must of course be relative to the degree of coverage required; the set of selectional patterns will never be 100% complete, so a larger corpus will always provide greater coverage. We attempt to shed some light on this question by processing a large corpus of text from a broad domain (business news) and observing how selectional coverage increases with corpus size. In many cases, there are limits on the amount of training text available. We therefore also consider how coverage can be increased using a fixed amount of text.
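To make the notion of selectional coverage concrete, here is a minimal Python sketch. It assumes that patterns are subject-verb-object triples of head words already extracted by a parser, and that a test triple counts as covered when the identical triple was seen in training at least `threshold` times. The function names, the threshold, and the toy data are hypothetical illustrations, not the authors' measurement procedure.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

# A selectional pattern here is a (subject, verb, object) triple of head words.
Triple = Tuple[str, str, str]


def collect_patterns(triples: Iterable[Triple]) -> Counter:
    """Count how often each subject-verb-object pattern occurs in the training data."""
    return Counter(triples)


def is_acceptable(patterns: Counter, triple: Triple, threshold: int = 1) -> bool:
    """Accept a triple if it was seen at least `threshold` times (threshold is an assumption)."""
    return patterns[triple] >= threshold  # Counter returns 0 for unseen triples


def coverage(patterns: Counter, test_triples: Sequence[Triple]) -> float:
    """Fraction of test triples licensed by the training patterns."""
    if not test_triples:
        return 0.0
    hits = sum(1 for t in test_triples if is_acceptable(patterns, t))
    return hits / len(test_triples)


def coverage_curve(train: Sequence[Triple], test: Sequence[Triple],
                   sizes: Iterable[int]) -> List[Tuple[int, float]]:
    """Coverage obtained when only the first n training triples are used, for each n."""
    return [(n, coverage(collect_patterns(train[:n]), test)) for n in sizes]


# Hypothetical toy data standing in for triples from a parsed business-news corpus.
TRAIN_TRIPLES = [
    ("company", "buy", "stock"),
    ("firm", "sell", "share"),
    ("company", "sell", "stock"),
    ("bank", "buy", "bond"),
]
TEST_TRIPLES = [
    ("company", "buy", "stock"),
    ("firm", "buy", "stock"),
    ("bank", "buy", "bond"),
    ("company", "sell", "share"),
]

print(coverage_curve(TRAIN_TRIPLES, TEST_TRIPLES, [1, 2, 4]))
# → [(1, 0.25), (2, 0.25), (4, 0.5)]
```

Even with all training triples, "firm buy stock" and "company sell share" remain uncovered although both are plausible; only exact matches count, which is the incompleteness that generalization is meant to reduce.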
The most straightforward acquisition procedures build selectional patterns containing only the specific word combinations found in the training corpus. Greater coverage can be obtained by generalizing from the patterns collected so that patterns with semantically related words will also be considered acceptable. In most cases this has been done using manually created word classes, generalizing from specific words to their classes [12,1,10]. If a pre-existing set of classes is used (as in [10]), there is a risk that the classes available may not match the needs of the task. If classes are created specifically to capture selectional constraints, there may be a substantial manual burden in moving to a new domain, since at least some of the semantic word classes will be domain-specific.
We wish to avoid this manual component by automatically identifying semantically related words. This can be done using the co-occurrence data, i.e., by identifying words which occur in the same contexts (for example, verbs which occur with the same subjects and objects). From the co-occurrence data one can compute a similarity relation between words [8,7]. This similarity information can then be used in several ways. One approach is to form word clusters based on this similarity relation [8]. This approach was taken by Sekine et al. at UMIST, who then used these clusters to generalize the semantic patterns [11]. Pereira et al. [9] used a variant of this approach, "soft clusters", in which words can be members of different clusters to different degrees. An alternative approach is to use the word similarity information directly, to infer information about the likelihood of a co-occurrence pattern from information about patterns involving similar words. This is the approach we have adopted for our current experiments [6], and which has also been employed by Dagan et al. [2].
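Using similarity directly, rather than through clusters, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' formula: it considers only verb-object pairs, estimates P(v | n) and P(n | v) by relative frequency, defines an assumed confusion-style weight P_C(n2 | n1) = Σ_v P(n2 | v) P(v | n1) (the chance that n2 fills a context drawn from the contexts of n1), and then smooths P(v | n) by averaging P(v | n2) over nouns n2 weighted by P_C(n2 | n). All names and the toy counts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

# A pattern here is a (verb, object-noun) pair; counts would come from a parsed corpus.
Pair = Tuple[str, str]


def conditional_tables(counts: Dict[Pair, int]):
    """Relative-frequency estimates of P(v | n) and P(n | v), both keyed by (v, n)."""
    noun_total: Dict[str, int] = defaultdict(int)
    verb_total: Dict[str, int] = defaultdict(int)
    for (v, n), c in counts.items():
        noun_total[n] += c
        verb_total[v] += c
    p_v_given_n = {(v, n): c / noun_total[n] for (v, n), c in counts.items()}
    p_n_given_v = {(v, n): c / verb_total[v] for (v, n), c in counts.items()}
    return p_v_given_n, p_n_given_v


def confusion(counts: Dict[Pair, int]) -> Dict[Pair, float]:
    """Assumed weight P_C(n2 | n1) = sum over v of P(n2 | v) * P(v | n1), keyed (n1, n2)."""
    p_v_n, p_n_v = conditional_tables(counts)
    verbs_of = defaultdict(list)  # noun -> verbs it was seen with
    nouns_of = defaultdict(list)  # verb -> nouns seen as its object
    for v, n in counts:
        verbs_of[n].append(v)
        nouns_of[v].append(n)
    conf: Dict[Pair, float] = defaultdict(float)
    for n1, verbs in verbs_of.items():
        for v in verbs:
            for n2 in nouns_of[v]:
                conf[(n1, n2)] += p_n_v[(v, n2)] * p_v_n[(v, n1)]
    return dict(conf)


def generalized_score(counts: Dict[Pair, int], verb: str, noun: str,
                      conf: Optional[Dict[Pair, float]] = None) -> float:
    """Smoothed P(verb | noun): P(verb | n2) averaged over n2, weighted by P_C(n2 | noun)."""
    p_v_n, _ = conditional_tables(counts)
    if conf is None:
        conf = confusion(counts)
    return sum(w * p_v_n.get((verb, n2), 0.0)
               for (n1, n2), w in conf.items() if n1 == noun)


# Hypothetical toy counts: "buy share" is never observed.
COUNTS = {("buy", "stock"): 2, ("sell", "stock"): 2, ("sell", "share"): 2, ("eat", "apple"): 2}

print(generalized_score(COUNTS, "buy", "share"))  # → 0.25
print(generalized_score(COUNTS, "buy", "apple"))  # → 0.0
```

"share" appears in the same context as "stock" (object of "sell"), so the unseen pattern "buy share" inherits part of P(buy | stock); "apple" shares no context with "stock" and receives nothing. For each n1 the weights P_C(n2 | n1) sum to 1, so the smoothed score stays a probability.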
We compute from the co-occurrence data a "confusion matrix", which measures the interchangeability of words in particular contexts. We then use the confusion matrix directly to generalize the semantic patterns.
2 Acquiring Semantic Patterns
Based on a series of experiments over the past two years [5,6] we have developed the following procedure
Similar resources
Generalizing over Lexical Features: Selectional Preferences for Semantic Role Classification
This paper explores methods to alleviate the effect of lexical sparseness in the classification of verbal arguments. We show how automatically generated selectional preferences are able to generalize and perform better than lexical features in a large dataset for semantic role classification. The best results are obtained with a novel second-order distributional similarity measure, and the posi...
Smoothing of Automatically Generated Selectional Constraints
Department of Computer Science New York University New York, NY 10003 ABSTRACT Frequency information on co-occurrence patterns can be automatically collected from a syntactically analyzed corpus; this information can then serve as the basis for selectional constraints when analyzing new text from the same domain. Better coverage of the domain can be obtained by appropriate generalization of the...
Acquisition Of Selectional Patterns
1 The Problem For most natural language analysis systems, one of the major hurdles in porting the system to a new domain is the development of an appropriate set of semantic patterns. Such patterns are typically needed to guide syntactic analysis (as selectional constraints) and to control the translation into a predicate-argument representation. As systems are ported to more complex domains, t...
Detecting novel metaphor using selectional preference information
Recent work on metaphor processing often employs selectional preference information. We present a comparison of different approaches to the modelling of selectional preferences, based on various ways of generalizing over corpus frequencies. We evaluate on the VU Amsterdam Metaphor corpus, a broad corpus of metaphor. We find that using only selectional preference information is enough to outperf...
Semantic Selectional Restrictions for Disambiguating Meronymy Relations
In this paper, we present an unsupervised approach to automatically learn lexico-syntactic patterns encoding meronymy relations from texts. Our major contribution lies in alleviating the challenge of disambiguating polysemous patterns that encode meronymy only in some contexts. We rely on the linking theory to posit that semantic features of the Part and Whole instances participating in a meron...